Skip to content

api: route per-call against unified hosts#5137

Open
simonfaltum wants to merge 17 commits intomainfrom
simonfaltum/cli-api-spog-routing
Open

api: route per-call against unified hosts#5137
simonfaltum wants to merge 17 commits intomainfrom
simonfaltum/cli-api-spog-routing

Conversation

@simonfaltum
Copy link
Copy Markdown
Member

@simonfaltum simonfaltum commented Apr 30, 2026

Why

databricks api always sent the workspace routing identifier (X-Databricks-Org-Id) when the profile had one, even when the path was an account API. On unified hosts (one host serving both workspace and account APIs) this misrouted account calls. There was also no way to explicitly route a call to the account API or override the identifier per call.

Changes

Before: routing was decided once from the profile and applied to every call.
Now: routing is decided per call from the path being requested.

  • Paths under /accounts/{id}/ are auto-detected as account-scope; the routing identifier is dropped.
  • A small hand-written list in cmd/api/paths.go carves out workspace-routed proxy APIs that happen to live under /accounts/, so they keep the identifier.
  • --account forces account-scope on a non-/accounts/ path.
  • --workspace-id <id> overrides the identifier per call. Mutually exclusive with --account.
  • ?o=<id> on the path (the SPOG URL convention used by the Databricks UI) is recognized as a per-call workspace override, so URLs pasted from the browser route correctly.
  • The CLI-only workspace_id = none sentinel is stripped before the routing decision so the literal "none" never goes on the wire.

Routing logic lives in pure functions (hasAccountSegment, extractOrgIDFromQuery, resolveOrgID, normalizeWorkspaceID, isWorkspaceProxyPath) that take primitives. The cobra RunE is a thin adapter that resolves config and calls them.

Test plan

  • go test ./cmd/api covers the helpers with table-driven cases: deny-list hits and misses, query/fragment edge cases, mutual-exclusion errors, sentinel stripping, ?o= extraction.
  • go test ./acceptance -run TestAccept/cmd/api exercises seven variants end to end against terraform and direct engines: workspace path, account path, deny-listed proxy under /accounts/, --account, --workspace-id, ?o= query, workspace_id = none. Each test asserts header presence/absence explicitly via print_requests.py | contains.py.
  • make checks

The next PR teaches `databricks api` to detect workspace-vs-account
scope per call. That decision needs a deny-list of paths under
accounts/ that the SDK builds without an account-ID slot (workspace
proxies). Hand-maintaining that list drifts from the SDK; this commit
generates it.

genpaths walks every service/*/impl.go in the pinned SDK with go/ast,
classifies each `path :=` assignment, and emits cmd/api/paths_generated.go
with a closed allowlist on account-ID source spellings. Refuses to emit
prefixes that would over-match, fails loudly on idioms it doesn't
recognize, handles var/define/assign forms and rejects compound
assignments. Hooked into ./task generate-paths and the existing
generated-files staleness gate in CI.

The generated tables are not yet referenced at runtime; the next PR
wires them in. Generated-file lint exclusions (lax rules) cover the
unused declarations until then.

Co-authored-by: Isaac
- Drop stdlib log in favor of fmt.Fprintf+os.Exit for the generator
  binary (depguard rule against the stdlib log package).
- Make the path-class switch exhaustive by listing the no-op cases.
- Lowercase the format-verb error string (staticcheck ST1005).

Co-authored-by: Isaac
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-api-spog-routing branch from 6db6f9e to a775d99 Compare April 30, 2026 09:38
@simonfaltum simonfaltum changed the title api: make databricks api host-agnostic api: generate workspace-proxy deny-list from SDK source Apr 30, 2026
Keep the unified-host routing support small for the initial PR by hand-maintaining the current workspace-proxy exceptions instead of parsing generated SDK source.
`databricks api <verb> <path>` previously bypassed the generated SDK
header logic and called client.Do directly, so on unified hosts where
workspace-vs-account routing is decided per-call it had no way to
distinguish the two. This wires up per-call detection using the
deny-list from the prior PR, plus three explicit overrides:

  --account              scope this call to the account API
  --workspace-id <id>    override the workspace routing identifier
  {account_id}           literal substituted from the active profile

Detection runs on URL.Path so query strings and fragments can't
false-match. The CLI-only WorkspaceIDNone sentinel (workspace_id =
none in .databrickscfg) is normalized to empty before the SDK's
idiomatic check sees it, so the literal "none" never goes on the
wire.

Behavior change for classic workspace profiles that have workspace_id
set: the routing identifier is now sent. Classic gateways ignore the
header so this should be benign; called out in the manual smoke plan
in case it surfaces.

Co-authored-by: Isaac
Keep the manual workspace-proxy list behind one helper so tests exercise the same path used by runtime account detection.
@simonfaltum simonfaltum changed the title api: generate workspace-proxy deny-list from SDK source api: route per-call against unified hosts Apr 30, 2026
@simonfaltum simonfaltum marked this pull request as ready for review April 30, 2026 10:32
Add the standard OS and CI/cicd replacements to acceptance/cmd/api/test.toml
and regenerate the recorded User-Agent strings to use os/[OS]. Without these,
goldens generated locally on macOS contain os/darwin and no cicd/ segment,
which fail on Linux + GitHub Actions where the SDK records os/linux ... cicd/github.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 1, 2026 06:59 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 1, 2026 06:59 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 08:34 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 08:34 — with GitHub Actions Inactive
SPOG URLs from the Databricks UI carry the workspace ID as a query param
(e.g. /api/2.2/jobs/list?o=7474644166319138). Recognize that param when
present and use it as the routing identifier so pasted URLs route
correctly without requiring --workspace-id. Precedence: --account >
--workspace-id flag > ?o= > account-path auto-detect > profile
workspace_id.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 09:25 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 09:25 — with GitHub Actions Inactive
cmd/api scripts pass POSIX paths like /api/2.0/clusters/list to the CLI.
Git Bash on Windows converts a leading "/" arg to a Windows path, so the
CLI sees /Program Files/Git/api/2.0/... and the testserver returns 404.
Set MSYS_NO_PATHCONV=1 in the parent test.toml, matching the pattern used
by cmd/workspace/export-dir-* and workspace/repos.

Also regenerate out.test.toml with the diff-friendly inline format
introduced by #5146.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 11:59 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 11:59 — with GitHub Actions Inactive
Comment thread cmd/api/api.go Outdated
// correctly without requiring --workspace-id.
orgIDQueryParam = "o"

accountIDPlaceholder = "{account_id}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it belongs in a different PR. The current impl doesn't interpolate account IDs and we don't need to start doing that (yet). Might be useful, but then we should see if we should interpolate more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair - I just wanted to make it easy to copy from the docs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled it out in the latest push (1ea4ffc) — {account_id} substitution and its acceptance tests are gone. Happy to do it as a focused follow-up if/when we want to expand path substitution more broadly.

Comment thread cmd/api/paths.go
"/api/2.0/preview/accounts/access-control/assignable-roles": {},
"/api/2.0/preview/accounts/access-control/rule-sets": {},
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for this list is unfortunate. What happens if we tack on the ?o= regardless?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or can we match on /accounts and /preview/accounts?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have the workspaceProxyPrefixes that makes that matching not possible afaik

@@ -0,0 +1 @@
$CLI api get /api/2.0/clusters/list --account
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is that we don't see any org ID in the request log, correct?

If so, it's worth capturing in another call here that confirms that assertion.

Easy to miss the intent otherwise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e9e0b4a. Each of the seven cmd/api acceptance tests now uses print_requests.py --get --keep | contains.py to assert the recorded request either does or does not carry X-Databricks-Org-Id (with a specific value where one is expected). The intent is now visible in the script and in output.txt, not implicit in out.requests.txt.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now dups the requests between out.requests.txt and output.txt. Can you keep one of them?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped --keep in 556b552print_requests.py now consumes out.requests.txt (deleted from the test directories) and the recorded request lives in output.txt only.

Pull this out of the per-call routing PR per review feedback. The routing
change does not need account-ID interpolation to land; if we want it later
it can be a focused follow-up alongside any other path-substitution we
decide to support.

Co-authored-by: Isaac
…citly

Each test now uses print_requests.py | contains.py to assert the
recorded request either does or does not carry the routing identifier.
The intent (header sent or not, with what value) is now visible in the
script and in output.txt, instead of being implied by the recorded
out.requests.txt artifact alone.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 13:39 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 13:39 — with GitHub Actions Inactive
@@ -0,0 +1 @@
$CLI api get /api/2.0/clusters/list --account
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now dups the requests between out.requests.txt and output.txt. Can you keep one of them?

@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 14:38 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 14:38 — with GitHub Actions Inactive
print_requests.py --keep was preserving out.requests.txt alongside the
new inline assertion in output.txt, duplicating the recorded request.
Drop --keep so the helper consumes out.requests.txt and the assertion
stays in output.txt only.

Co-authored-by: Isaac
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 14:41 — with GitHub Actions Inactive
@simonfaltum simonfaltum temporarily deployed to test-trigger-is May 4, 2026 14:41 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants